(57) Abstract: A method of identifying available time slot(s) in a packet switch in order to route a packet from an input port to
a designated output port, the method comprising the steps of: logically combining the status of each time slot with regard to the
input port and with regard to the output port to generate the status of the time slot with regard to the given input port-output port
pair; logically combining in pairs the status of input port-output port pairs to determine whether one of the input port-output port
pairs is available; and repeating the step of pair-wise logically combining the status of input port-output port pairs to determine input
port-output port pair availability until one available input port-output port has been identified.

PACKET SWITCHING



This invention relates to packet switching (or cell switching), in particular
methods for allocating requests for switching from one of the inputs of a packet switch
to one of the outputs of the packet switch and methods of fabric allocation within a
packet switch.

Input-buffered cell switches and packet routers are potentially the highest
possible bandwidth switches for any given fabric and memory technologies, but such
devices require scheduling algorithms to resolve input and output contentions. Two
approaches to packet or cell scheduling exist (see, for example, A Hung et al, "ATM
input-buffered switches with the guaranteed-rate property," and A Hung et al, Proc.
IEEE ISCC '98, Athens, Jul 1998, pp 331-335). The first approach applies at the
connection-level, where bandwidth guarantees are required. A suitable algorithm must
satisfy two conditions for this; firstly it must ensure no overbooking for all of the input
ports and the output ports, and secondly the fabric arbitration problem must be solved
by allocating all the requests for time slots in the frame.

According to a first aspect of the invention there is provided a method of
allocating switch requests within a packet switch, the method comprising
the steps of:

(a) logically combining the status of each time slot with regard to the input port
and with regard to the output port to generate the status of the time slot with regard to
the given input port-output port pair;

(b) logically combining in pairs the status of input port-output port pairs to
determine whether one of the input port-output port pairs is available; and

(c) repeating step (b) iteratively until one available input port-output port has
been identified.

According to a second aspect of the invention there is provided a method of
allocating switch requests within a packet switch, the method comprising the steps of:

(a) logically combining the status of each time slot with regard to the input port
and with regard to the output port to generate the status of the time slot with regard to
the given input port-output port pair; and

(b) processing the inlet-outlet pair status information generated in step (a) in a
logical concentrator so that the available input port-output port pair(s) are ordered
hierarchically.

According to a third aspect of the invention there is provided a method of
allocating switch requests within a packet switch, the method comprising the steps of:

(a) establishing switch request data at each input port;

(b) processing the switch request data for each input port to generate request
data for each input port-output port pairing;

(c) comparing the number of requests from each input port and to each output
port with the maximum request capacity of each input port and each output port;

(d) allocating all requests for those input-output pairs where the total number of
requests is less than or equal to the maximum request capacity of each input port and
each output port;

(e) reducing the number of requests for those input-output pairs where the
total number of requests is greater than the maximum request capacity of each input
port and each output port such that the number of requests is less than or equal to the
maximum request capacity of each input port and each output port; and

(f) allocating the remaining requests.
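
Steps (b) and (c) above can be sketched as a request matrix whose row and column totals are compared with a port capacity. This is only an illustration: the function name, the use of a single `capacity` value shared by every port, and the example matrix are assumptions, and the reduction rule of step (e) is not implemented because the text does not specify one.

```python
from typing import List, Tuple


def find_overbooked_ports(requests: List[List[int]],
                          capacity: int) -> Tuple[List[int], List[int]]:
    """Return the input ports and output ports whose total requests exceed capacity.

    requests[i][j] is the number of requests from input port i to output port j
    (the per-pairing request data of step (b)).
    """
    # Total requests leaving each input port (row sums).
    input_totals = [sum(row) for row in requests]
    # Total requests arriving at each output port (column sums).
    output_totals = [sum(col) for col in zip(*requests)]

    overbooked_inputs = [i for i, t in enumerate(input_totals) if t > capacity]
    overbooked_outputs = [j for j, t in enumerate(output_totals) if t > capacity]
    return overbooked_inputs, overbooked_outputs


# Hypothetical 3x3 request matrix with an assumed capacity of 4 per port.
example_requests = [
    [2, 1, 0],
    [0, 3, 2],
    [1, 1, 1],
]
over_in, over_out = find_overbooked_ports(example_requests, capacity=4)
```

Pairs that touch none of the flagged ports could be allocated directly as in step (d); pairs that touch a flagged port would first need their request counts reduced.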

The present invention additionally provides a method of allocating switch
requests within a packet switch, the method comprising the steps of:

(a) establishing switch request data at each input port;

(b) processing the switch request data for each input port to generate request
data for each input port-output port pairing;

(c) allocating a first switch request from each of the input port-output port pairing
request data, the requests being allocated only if the maximum request capacity of
the respective output port has not been reached; and

(d) allocating further switch requests by the iterative application of step (c) until
the maximum request capacity of each output port has been reached.
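
The iterative allocation of steps (c) and (d) can be sketched as repeated passes over the input-output pairings, granting at most one request per pairing per pass. This is a sketch under assumptions: the row-major scan order, the function name and the single `out_capacity` value are illustrative choices not taken from the text, and the loop also stops when the requests run out before the output ports are full.

```python
from typing import List


def allocate_iteratively(requests: List[List[int]],
                         out_capacity: int) -> List[List[int]]:
    """Grant requests one per pairing per pass, capped by output port capacity.

    requests[i][j] is the number of requests from input port i to output port j.
    Returns granted[i][j], the number of requests allocated for each pairing.
    """
    n_in = len(requests)
    n_out = len(requests[0]) if requests else 0
    remaining = [row[:] for row in requests]        # do not mutate the caller's data
    granted = [[0] * n_out for _ in range(n_in)]
    out_load = [0] * n_out                          # requests already granted per output

    progress = True
    while progress:                                 # step (d): repeat step (c)
        progress = False
        for i in range(n_in):
            for j in range(n_out):
                # Step (c): grant only while the output port still has capacity.
                if remaining[i][j] > 0 and out_load[j] < out_capacity:
                    remaining[i][j] -= 1
                    granted[i][j] += 1
                    out_load[j] += 1
                    progress = True
    return granted


granted = allocate_iteratively([[2, 0], [2, 1]], out_capacity=3)
```

In this example output port 0 receives four requests but can accept only three, so one request from input port 1 is left unallocated.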

The invention will now be described with reference to the following figures in
which:

Figure 1 is a schematic depiction of a three stage switch;

Figure 2 is a schematic depiction of an apparatus according to the present
invention;

Figure 3 is a schematic depiction of a half-matrix concentrator according to
the present invention;

Figure 4 is a schematic depiction of a concentrator according to the present
invention;

Figure 5 is a depiction of a (8,8) concentrator according to the present
invention;

Figure 6 is a schematic depiction of a concentrator according to a second
embodiment of the present invention;

Figure 7 is a schematic depiction of a (7,4) concentrator according to the
second embodiment of the present invention;

Figure 8 is a schematic depiction of a (8,8) concentrator according to a third
embodiment of the present invention;

Figure 9 is a schematic depiction of a (8,8) concentrator according to a fourth
embodiment of the present invention;

Figure 10 is a schematic depiction of an apparatus for determining
sequentially the address of a common middle-stage switch;

Figure 11 is a schematic depiction of an apparatus for preventing the
overbooking of output ports according to the present invention; and

Figure 12 is a schematic depiction of a processor for use in determining in
parallel the address of a common middle-stage switch according to the present
invention.

Figure 1 shows a three-stage switch 100 which comprises a plurality of first
stage switches 110, a plurality of middle stage switches 120 and a plurality of third
stage switches 130. Each of the first stage switches is connected to all of the middle
stage switches, and all of the middle stage switches are connected to each of the third
stage switches so that any first stage switch can be connected to any third stage
switch via any of the middle stage switches. The three-stage switch 100 has p first
stage switches 110 and p third stage switches 130, each of which is connected to n
ports (input ports for the first stage switches 110 and output ports for the third stage
switches 130). The total capacity of the three-stage switch is N input ports and N
output ports, where $N = n \times p$. The number of middle stage switches is m, where for a
non-blocking switch $m = 2n-1$. Each first stage switch has m outlets to the middle stage
switches, one outlet being connected to each middle stage switch. Similarly, each
third stage switch has m inlets from the middle stage switches, one inlet being
connected from each middle stage switch. (Although the following discussion
assumes the use of a non-blocking switch, the present invention may also be used
with lower values of m, as long as m remains greater than or equal to n.)

The inventor has had the insight that techniques that can be applied to the
setting-up of connections between the input ports and the output ports of circuit-based
switches may also be applied to packet-based (or cell-based) switches.

If a single call processor is employed to control connection set-ups in the
symmetric three-stage packet switch of Figure 1, the maximum number of processing
steps required to find a free, common middle-stage switch through which to connect
one of the N input ports i to one of the N output ports j is essentially the number of
outlets and inlets, m, on the first and third stage switches $A_2$ and $C_3$ respectively. In
the prior art it is known to compare sequentially the statuses of every outlet/inlet pair
attached to the same middle-stage switch, stepping through the middle-stage
switches, until the first pair is found that are both unused. To make N connections
across the switch therefore requires a maximum of $O(Nm)$ processing steps. For a
strictly non-blocking Clos switch with $m = 2n-1$, and for which $n \approx (N/2)^{1/2}$ to minimise the
number of matrix crosspoints needed to interconnect a total of N ports, this results in
$O(N^{3/2})$ computing steps.
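
The step count quoted above follows directly from substituting the stated switch dimensions into the $O(Nm)$ bound; the short derivation below is added for illustration only and uses no quantities beyond those defined in the text:

\[
N m = N(2n-1) \approx 2N\left(\frac{N}{2}\right)^{1/2} - N = \sqrt{2}\,N^{3/2} - N = O\!\left(N^{3/2}\right).
\]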

Figure 2a shows an apparatus 200 embodying a binary logic tree that allows
the above processing to be performed sequentially in order to find the first free,
common middle-stage switch more quickly. The logic tree comprises outlet status
array 210 and inlet status array 220, each status array having m elements. The outlet
status array 210 contains the status of the m outlets connected to one of the first
stage switches $A_i$, and the inlet status array 220 contains the status of the m inlets
connected to one of the third stage switches $C_j$. A '0' entry in the array indicates that
the outlet (or inlet) is in use and a '1' entry in the array indicates that the outlet (or
inlet) is available to be used. The two status arrays are logically compared using an
array of m AND gates 230, such that a logic '1' output is produced only when both an
outlet and its corresponding inlet are free. The statuses of all outlet and inlet pairs are
made available simultaneously using parallel logic gates.

One of the resulting free inlet/outlet pairs (there may be only one such pair) is
selected by comparing the outputs from the AND gate array 230 using the array of
binary comparison elements 240. Figure 2b shows the structure of such a binary
comparison element 240 and Figure 2c shows the truth table for the binary
comparison element 240. If only one of the inlet/outlet pairs is available, then the
binary comparison element 240 will pass this availability data and the addresses of
the available inlet and outlet pair (the addresses may be passed either sequentially or
in parallel and have the form of $\log_2 m$ bits) to the next stage of comparison using a
binary comparison element 240. If both sets of inlet/outlet pairs are free, then, for
example with the combination element and truth table shown in Figures 2b & 2c, the
inlet/outlet pair on the uppermost input of the binary comparison element is chosen as
the output. Thus, after $\log_2 m$ stages of comparisons, the address of a free inlet/outlet
pair is switched to the output of the binary tree 200, which should be the
'uppermost' free pair in the two status arrays 210 & 220.

Assuming that the address bits are switched in a parallel bus, the number of
computing steps needed to find the first free middle-stage switch is simply the number
of binary tree stages, i.e. $\log_2(2m)$. Thus, the total computing time for N connections is
$O(N\log_2(2m))$, which for the above assumptions results in $O(N\log_2 N)$ computing steps.
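
The behaviour of the binary logic tree described above can be sketched in software as an AND of the two status arrays followed by pair-wise selection that prefers the uppermost free entry. This is a minimal sketch: the function name and list-based representation are assumptions, and the real apparatus evaluates each tree stage in parallel hardware rather than in a loop.

```python
from typing import List, Optional, Tuple


def first_free_middle_switch(outlets: List[int],
                             inlets: List[int]) -> Optional[int]:
    """Return the address of the uppermost common free middle-stage switch.

    outlets[k] and inlets[k] are 1 when the link to middle-stage switch k is
    free and 0 when it is in use. Returns None if no common free switch exists.
    """
    # AND gate array: a pair is free only if both outlet and inlet are free.
    pairs: List[Tuple[int, int]] = [
        (o & i, addr) for addr, (o, i) in enumerate(zip(outlets, inlets))
    ]
    if not pairs:
        return None

    # Binary tree: each stage halves the number of candidates.
    while len(pairs) > 1:
        next_stage = []
        for k in range(0, len(pairs), 2):
            upper = pairs[k]
            # Pad an odd-sized stage with a permanently busy dummy entry.
            lower = pairs[k + 1] if k + 1 < len(pairs) else (0, -1)
            # Prefer the uppermost input when it is free.
            next_stage.append(upper if upper[0] else lower)
        pairs = next_stage

    free, addr = pairs[0]
    return addr if free else None


chosen = first_free_middle_switch([0, 1, 1, 0, 1], [1, 0, 1, 1, 1])
```

Because every comparison keeps the upper candidate whenever it is free, the address that reaches the root is always the lowest-indexed common free switch.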

As an alternative to the above method it is possible to handle connection
requests in parallel rather than sequentially, so that the paths for N connections can
be computed simultaneously. A number of connection requests could be collected,
up to the full throughput of the switch N, which could then be processed together. In
this approach, all the connections wanted between a pair of first and third stage
switches, e.g. $A_2$ and $C_3$ in Figure 1, are computed simultaneously. Every first stage
switch has a processor associated with it (alternatively they could be associated with
the third stage switches) so there are N/n (or p) outer-stage processors. Each of these
processors must interrogate each of the third stage switches in turn (N/n [or p] of
them) and for each one find up to n free common inlet/outlet pairs to middle-stage
switches, depending on the precise number of connections required between that pair
of switches, $A_i/C_j$. To do this it must interrogate the status of all m middle-stage
switches. Beginning, for example, with $A_1/C_1$, it will be possible for first and third
stage switches $A_2/C_2$ to compute their set of common, free middle-stage switches at
the same time, and this parallel processing holds for all $A_i/C_i$. When all these parallel
computations have been completed, the first and third stage switch pairings are
cycled around by one, so that all $A_i/C_{i+1}$ switch pairings are computed in parallel. This
continues until all N/n input switches have been paired with all N/n third stage
switches, taking N/n steps. For each pairing of first and third stage switches, it is
necessary to compute up to n separate connections.
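
The cyclic pairing of first and third stage switches described above can be sketched as a simple rotation schedule. The function name and the zero-based indexing are assumptions made for illustration.

```python
from typing import List, Tuple


def pairing_schedule(p: int) -> List[List[Tuple[int, int]]]:
    """Return p steps; step s pairs first-stage switch i with third-stage (i + s) mod p."""
    return [[(i, (i + step) % p) for i in range(p)] for step in range(p)]


schedule = pairing_schedule(3)
```

Within any one step every first stage switch and every third stage switch appears exactly once, so the p pairings can be processed in parallel, and after p = N/n steps every first stage switch has met every third stage switch.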

Once all of the possible connections between a given first stage switch and a
given third stage switch have been established, one of these connections will be 
selected and the connection made between the first stage switch and the third stage 
switch. 

Figure 3a shows a concentrator 300 that can compute these connections in
parallel, packing the addresses of all free, common middle-stage switches together
onto its output lines. Concentrator 300 comprises outlet status array 310, inlet status
array 320, AND gate array 330 and an array of binary comparison elements 340.
Figure 3b shows the structure of the binary comparison element 340 and Figure 3c
shows the truth table for the binary comparison element 340. The outlet status array
310 and the inlet status array 320 each have m elements, the elements having a '1'
status to indicate that the associated middle stage switch is available and a '0' status
to indicate that the associated middle stage switch is in use. The two status arrays
310 & 320 are logically compared using the array of m AND gates 330, such that a
logic '1' output is produced only when both an outlet and its corresponding inlet are
free. The binary comparison elements 340 process the outputs of the AND gate array
330 so that all of the '1' logic states are grouped together on the uppermost output
lines, along with the address of the associated middle-stage switch. The apparatus of
Figure 3 will guarantee to find up to n free switches in 3n-3 stages, using 3n(n-1)/2
comparison elements. The total number of computing steps that results is $O(N)$ and
the number of comparison elements required is $O(N^{3/2})$ using the previous optimum
switch design. This allows all N connections through a 3-stage packet switch to be
computed in at most linear time.
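
The input-output behaviour of such a concentrator can be modelled in a few lines. This is a functional sketch only: it does not model the network of comparison elements or its stage count, and the function name, the ascending address order on the packed outputs, and the use of `None` for unused output lines are assumptions.

```python
from typing import List, Optional, Tuple


def concentrate(outlets: List[int],
                inlets: List[int]) -> List[Tuple[int, Optional[int]]]:
    """Pack the addresses of common free middle-stage switches onto the top outputs.

    Returns one (flag, address) tuple per output line; all flag-1 entries come
    first, each carrying the address of a free middle-stage switch.
    """
    # AND gate array: 1 only when both the outlet and the inlet are free.
    flags = [o & i for o, i in zip(outlets, inlets)]
    free_addresses = [addr for addr, flag in enumerate(flags) if flag]

    unused = len(flags) - len(free_addresses)
    packed_flags = [1] * len(free_addresses) + [0] * unused
    packed_addresses: List[Optional[int]] = free_addresses + [None] * unused
    return list(zip(packed_flags, packed_addresses))


packed = concentrate([1, 0, 1, 1], [1, 1, 0, 1])
```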

Figure 4 shows an alternative to the concentrator shown in Figure 3. The
multi-stage concentrator 400 shown in Figure 4 has outlet status array 310, inlet
status array 320 and AND gate array 330, as shown in Figure 3. The outputs of the m
AND gates are fed into an (m,m) concentrator 490, which comprises m/2 binary
combination elements 340 (as shown in Figures 3b & 3c), two (m/2, m/2)
concentrators 450, and an m x m rearrangeable switching network 460. Figure 5
shows how the (m/2, m/2) concentrator 450 and the m x m rearrangeable switching
network 460 may be constructed using a plurality of binary combination elements 340
for m=8. The concentrator 490 has an iterative structure, consisting of three parts.
The first is a single input stage of comparison elements 340, which feed the second
part, which consists of two (m/2, m/2) concentrators 450, i.e. two half-concentrators.
[...] switching network 460 (or [...]. The function of an (m,m) concentrator is to pack the
free common middle-stage switch addresses into the uppermost output lines of the
concentrator. Figure 5 shows an example of an (8,8) concentrator, which for [...]
and output ports a, b, e and g of each comparison element are [...]

Figure 6 depicts an alternative embodiment of the present invention [...]. The
remainder of the multistage concentrator 800 is as shown in Figure 4. Analysis
indicates that merging networks [...]

It can be shown that the number of stages needed [...] concentrator is [...].
The total number of comparison elements is [...] for an optimum 3-stage packet
switch with n [...]

[...] strictly non-blocking 3-stage switch. Because [...] concentrator actually
requires only 7 inputs and 4 outputs [...] could all come from the outputs of the
top half-concentrator, [...]
between the top and bottom half-concentrators, but never all through the bottom one
(as it only has three inputs). Consequently, the top half-concentrator must have a full
(((m+1)/2),((m+1)/2)) structure, i.e. (4,4), but the bottom one has one less input and, in
this case, two fewer outputs, and so it need only be a (3,2) concentrator. Figure 7
shows the resulting (7,4) concentrator 900 structure, which comprises 3 binary
comparison elements 340, a (4,4) concentrator 910, a (3,2) concentrator 920 and a 2
stage merging network 930. The (4,4) concentrator 910, (3,2) concentrator 920 and
the 2 stage merging network 930 are all formed from a plurality of binary comparison
elements 340.

If the $\log_2(m/2)$ logic stages of the merging network part of a concentrator
were to be replaced by a single logic step, then the iterative procedure for
constructing large concentrators out of smaller concentrators would no longer grow as
$O((\log_2 m)^2)$, but as $O(\log_2 m)$, i.e. in a similar fashion to the iterative procedure for
constructing large permutation networks. Figure 8a shows an example of the structure
of an (8,8) concentrator. The concentrator 1000 comprises four binary comparison
elements 1010, two (4,4) half concentrators 1020 and a two stage merging network
1030 formed from five switching elements 1030A, 1030B, 1030C, 1030D & 1030E.
Figure 8b shows the structure of binary comparison element 1010 and Figure 8c
shows the truth table of the binary comparison element 1010. Binary comparison
element 1010 comprises an OR gate 1011, a 2x2 switch 1012 and an AND gate 1013.
Inputs a and b carry data regarding the availability of middle stage switches, whilst
inputs c and d carry the address of the respective middle stage switch (Figure 8a
shows only the data buses and omits the address buses for the sake of clarity). Each
(4,4) half concentrator 1020 comprises 5 binary comparison elements 1010
configured as shown in Figure 8a. The two half-concentrators produce eight data
outputs 1021-1028, with 1021 being the uppermost output and 1028 the lowermost
output as shown in Figure 8a. The two stage merging network 1030 is formed from
five switching elements 1030A-1030E. Figure 8d shows the structure of switching
elements 1030A-1030E and Figure 8e shows the truth table of the switching elements
1030A-1030E. Switching elements 1030A-1030E comprise a NOT gate 1031, and two
2x2 switches 1032 & 1033. Inputs a and b carry data regarding the availability of middle
stage switches, whilst inputs c and d carry the address of the respective middle stage
switch. The control signals applied to the NOT gate of switching elements 1030A-1030E
are taken from the data outputs of the half-concentrators.

The concentrator 1000 differs from the previous (8,8) concentrator structures
by using some of the half-concentrator outputs as control inputs for the two stages of
switching elements in the merging network. The switching elements are no longer
used as logic elements, but the control inputs from the half-concentrators set the
states of the 2x2 switches. These can all be set simultaneously by their control inputs,
so there is only one logic "toggle" delay for all switches. The half-concentrator output
logic states and addresses now propagate through the previously set switches,
incurring no more "toggle" delays through the multiple stages of the merging network
but only propagation delays and any bandwidth narrowing through each switch. If the
logic and switch elements are implemented electronically, e.g. with transistors, where
propagation delays may be small due to chip integration, benefits may be gained if
bandwidth narrowing per switch stage is less than the equivalent inverse "toggle"
delay of a logic gate. If all-optical interference switches are considered for the
implementation technology, the bandwidth narrowing can be extremely small per
stage (e.g. around $10^{12}$ Hz bandwidth is possible per stage using psec pulses), while
the "toggle" rate may be far lower (e.g. 10 Gbit/s). This apparently enormous benefit
of all-optical switching will of course be offset by the relatively long propagation
delays, due to lower levels of integration in optics, but these propagation delays will
decrease as optical integration technology advances.

Figure 8a shows all the possible permutations of logic states at the
half-concentrator outputs 1021-1028. There are far fewer permutations at this location in the
concentrator than at its overall inputs. At the half-concentrator outputs, the logic 1s
representing free, middle-stage switch addresses are concentrated or packed into the
uppermost outputs of each of the two half-concentrators. Now these two sets
of 1s must still be packed together, side by side, but there can be outputs of the top
half-concentrator, between these two sets of 1s, that are in the 0 logic state.
However, because the 1s from the top half-concentrator are packed together, this
means that any 0s are also packed together. So each of the m/2 - 1 top
half-concentrator outputs that could be separating logic 1s can be used to control one of
the merging network switches. Let us start by considering output 1024. If
this is in the 0 state, we want to switch all possible 1s in outputs 1025, 1026 and 1027
up by one output, i.e. to outputs 1024, 1025 and 1026. This is achieved by controlling
switches 1030B and 1030E by output 1024, such that they are switched to the
crossed state when output 1024 is in the 0 state. When the logic 0 from output 1024
now propagates through switch 1030B, it will be routed out of the way, closing the gap
between the two sets of 1s if there is only one logic 0 on output 1024. But if there is
also a logic 0 on output 1023, from the permutations shown in Figure 8, we therefore
need at most to switch outputs 1025 and 1026 up to outputs 1023 and 1024. Output
1023 must therefore control switches 1030A and 1030D. Switch 1030B has already
been taken care of (controlled) by output 1024's logic 0. The fact that switch 1030E is
also crossed does not matter, since when there are two logic 0s, in outputs 1023 and
1024, it will simply swap over the logic 0s on its outputs, which makes no difference.
The two logic 0s are now removed from between the sets of logic 1s. If output 1022
is also a logic 0, so there are three logic 0s together on outputs 1022, 1023 and
1024, then we need to do no more than switch output 1025 to output 1022. So output
1022 should control switch 1030C. All other switch settings 1030A, 1030B, 1030D and
1030E don't matter. In this way three logic 0s will be removed, enabling the two 1s to
be together on outputs 1021 and 1022. There is one additional problem that when
output 1024 is in the 1 state, all switches 1030A-1030E will be in the through state,
and so the crossed wiring pattern between lines 1024 and 1025 will cause a 0 on
output 1025 to be raised to output 1024 (thus creating a new gap between the logic
1s). There is only one permutation out of the 15 where this happens. This is solved,
when output 1025 is in the 0 state, by using this output as a second control input to
switch 1030D, in order to allow the logic 1 from output 1024 to be raised back to
output 1024. Switch 1030D will be in the crossed state if either output 1023 or output
1025 is in the 0 state.

The number of stages of switching elements remains as above, but the
number of logic steps can be reduced. Let us assume the number of logic steps for a
(4,4) concentrator is 3 (as before). The number of steps for an (8,8) concentrator will
be $S(8,8) = 1 + S(4,4) + 1 = 2 + S(4,4)$ because a concentrator requires two
half-concentrators sandwiched between a left-hand logic stage and a right-hand merging
network. Similarly, $S(16,16) = 1 + S(8,8) + 1 = 2 + 2 + S(4,4)$ and $S(32,32) = 1 +
S(16,16) + 1 = 2 + 2 + 2 + S(4,4)$, so $S(m,m) = 2\log_2(m/4) + S(4,4) = 2\log_2 m - 1$. (If
additional regeneration were required after every merging network within a
concentrator, then $S(m,m) = 3\log_2 m - 3$ stages.) The total number of computing steps
required is $O((2N/n)\log_2 n)$, which for an optimum 3-stage packet switch with $n \approx (N/2)^{1/2}$
is $O(N^{1/2}\log_2 N)$. The total number of logic and switching elements is the same as when
the merging networks are used as multiple stages of logic elements, i.e. $O(N(\log_2 N)^2)$.
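
The recurrence for the number of logic steps can be checked numerically against its closed form. This is a small illustrative check: the function name is hypothetical, and it assumes m is a power of two no smaller than 4 and that the base case S(4,4) = 3 stated above holds.

```python
def concentrator_steps(m: int, base_steps: int = 3) -> int:
    """Logic steps S(m,m) for an (m,m) concentrator with a single-step merging network.

    Implements S(m,m) = 1 + S(m/2,m/2) + 1 with S(4,4) = base_steps.
    """
    if m < 4 or m & (m - 1) != 0:
        raise ValueError("m must be a power of two and at least 4")
    if m == 4:
        return base_steps
    # One input logic stage, the half-concentrators, then one merging step.
    return 1 + concentrator_steps(m // 2, base_steps) + 1


# Compare with the closed form 2*log2(m) - 1 for several sizes.
steps_by_size = {m: concentrator_steps(m) for m in (4, 8, 16, 32, 64)}
closed_form = {m: 2 * (m.bit_length() - 1) - 1 for m in (4, 8, 16, 32, 64)}
```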

By careful [...] of the link pattern between the [...] network it is possible to
reduce the logic complexity to such an extent that it [...] but the embodiment
shown in Figure 9 and described below is one example of such a [...] over all the
permutations. The order is 1021, 1022, 1025, [...] 1026, 1027, 1028. Figure 9a
shows the link pattern between the [...] concentrator outputs and the merging
network re-arranged in this order, for an (8,8) concentrator 1100. The
concentrator 1100 has the same structure as the concentrator [...] 1x2 switches.
Figure 9b shows the control of the elements [...] by the data links. Each element
[...] is controlled by only one link, except the gate that prevents logic [...] the
gates connecting it to either port 4 or 5 are [...] permutations of 1s that many of
the permutations (6 of them for an [...]) have gaps which need to be closed up by
the 1s below them. The 1s to be [...] are chosen with as simple a rule as possible.
Every link that can possess a gap has logic to decide whether the particular
permutation [...] is currently [...] the 1s below the link are switched upwards
appropriately to fill the gap. But [...] (8,8), this will have the added
complication that more than [...] can have a gap (as defined above) within the same
[...] must be made as to which link should control the raising of 1s below it [...]
decision must be made [...] uppermost link with [...]
concentrator example shown in Figure 9a. The numbers in brackets on the links
indicate the output ports to which the links may need to be switched. Most need to be
switched only to one other output port, and only link 6 needs to be switched to two
possible other output ports. With this single-stage merging network, the number of
concentrator stages is again
$S(m,m) = 2\log_2 m - 1$, and the total number of computing steps is again
$O(N^{1/2}\log_2 N)$ for an optimum 3-stage packet switch with $n \approx (N/2)^{1/2}$. The number of
logic and switching elements depends on the particular concentrator structure. The
one described above requires $O(m^2/8)$ gates and/or switches in the merging network,
and hence $O(m^2/8)$ for a complete concentrator. So, overall there would be $O(N^{3/2})$ for
an optimum 3-stage packet switch with $n \approx (N/2)^{1/2}$.

Other concentrator designs are known (e.g. Ted H. Szymanski, "Design
principles for practical self-routing nonblocking switching networks with O(N.logN)
bit-complexity," IEEE Trans. on Computers, vol. 46, no. 10, 1057-1069 (1997) and Joseph
Y Hui, Switching and Traffic Theory for Integrated Broadband Networks, Kluwer
Academic Publishers, 1990, Chapter 4) and these concentrator structures may also
be used with the above parallel path search algorithm to achieve the same number of
computing steps, i.e. $O(N^{1/2}\log_2 N)$.

Connection requests in a 3-stage switch are conventionally handled
sequentially (see Figure 10). Before path-searching is performed on a new connection
request it is first established whether the output port is free, and willing to accept the
connection. This can simply be achieved by interrogating the status of the desired
output port. Since there are N such ports, this needs only a decoder of $\log_2 N$ stages.
For N sequential connection requests, this would take $O(N\log_2 N)$ steps. There
follows a possible implementation for minimising the number of processing steps
needed to ensure no overbooking when all N requests are processed in parallel, as in
the method of the present invention.
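
The output port check described above amounts to testing and then seizing one status bit per request. The sketch below illustrates the sequential form of that check; the function name, the boolean status list, and the rule that the earliest request for a port wins are assumptions made for illustration, not details taken from the text.

```python
from typing import List, Tuple


def admit_requests(port_free: List[bool],
                   requested_ports: List[int]) -> Tuple[List[int], List[bool]]:
    """Admit each request only if its output port is still free.

    port_free[k] is True when output port k is free. Returns the indices of the
    admitted requests and the updated status list.
    """
    status = list(port_free)            # work on a copy of the status list
    admitted = []
    for request_id, port in enumerate(requested_ports):
        if status[port]:
            status[port] = False        # mark the port as connected
            admitted.append(request_id)
    return admitted, status


admitted, new_status = admit_requests([True, False, True, True], [2, 1, 2, 0])
```

Here the second request for port 2 and the request for the already connected port 1 are both refused, so no output port is overbooked.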

Figure 11 shows an apparatus for determining the address of available
middle-stage switches. The apparatus comprises a number of shift registers 1310, a
decoder 1320 and an array 1330 containing a list of output port statuses. If each
output port is represented by a bit of information representing its status (free or
connected), then a simple list of N bits is needed. To establish whether a requested
output port is free, one simply has to switch a request bit to the desired memory
location in the list, and if it is free, allow the request bit through to provide a time
connections using those middle-stage switches. Overall there are N/n iterations and
in each one, the matrix and list of each first-stage switch are paired with the list of one
third-stage switch, such that any first- or third-stage switch is involved in only one
pairing.

Figure 12 shows the structure of each of the N/n processors 1400 that are
required. Each processor 1400 comprises a first stage connection request matrix
1410, a first stage status list 1420, a third stage status list 1430, a first array of AND
gates 1440, a plurality of shift registers 1450, a first concentrator 1460, a second
concentrator 1465 and a second array of AND gates 1445. The first stage connection
request matrix 1410 is connected to the first concentrator 1460. The outputs of the
first stage status list 1420 and the third stage status list 1430 are logically combined in
the first array of AND gates 1440, which has m AND gates. The first element of the
first stage status list 1420 is combined with the first element of the third stage status
list 1430 in the first AND gate of the array 1440, and so on for all m elements of the
first stage status list and the third stage status list and the array of AND gates. The
outputs of the AND gate array 1440 are concentrated using second concentrator 1465
such that all common, free middle-stage switches are connected to at most m
adjacent links 1475. At the same time the contents of the first stage connection
request matrix 1410, i.e. the first-stage connection requests wishing to connect
between that particular pair of first- and third-stage switches, are passed through the
first concentrator 1460, such that the connection requests can be concentrated onto
at most n adjacent links 1470. Both of the concentrators 1460 & 1465 can operate
bi-directionally, or possess two sets of switches/gates, in order to provide two separate
contra-directional paths. Both of the concentrators are simultaneously concentrating
another of the pairings.
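
As a rough illustration of the data flow through one processor 1400, the following Python sketch models the AND gate array 1440 and the two concentrators in software. The function names, the convention that 1 means "free" or "request present", and the use of Python lists are assumptions made for illustration only; the specification describes hardware and does not prescribe this representation.

```python
def common_free(first_stage_status, third_stage_status):
    """Model of AND gate array 1440: element-wise AND of two m-bit status lists.

    Assumed convention: 1 = middle-stage switch free on that side, 0 = busy.
    """
    if len(first_stage_status) != len(third_stage_status):
        raise ValueError("both status lists must have m elements")
    return [a & b for a, b in zip(first_stage_status, third_stage_status)]


def concentrate(flags):
    """Model of a concentrator: pack the active inputs onto adjacent links.

    Returns the indices of the set flags in order, so that link k carries
    the k-th active input.
    """
    return [i for i, flag in enumerate(flags) if flag]


# Example with n = 3 inputs per first-stage switch and m = 2n - 1 = 5.
first = [1, 0, 1, 1, 0]          # first stage status list 1420
third = [1, 1, 0, 1, 0]          # third stage status list 1430
free_links = concentrate(common_free(first, third))   # adjacent links 1475
requests = concentrate([0, 1, 1])                      # adjacent links 1470

# Pair the k-th concentrated request with the k-th common free switch;
# zip stops at the shorter list, so surplus requests or switches stay unmatched.
assignments = list(zip(requests, free_links))
```

In this example the second and third inputs of the first-stage switch are matched with middle-stage switches 0 and 3, the only two that are free on both sides.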

The second array of AND gates 1445, comprising n AND gates, enables the
required number of connection requests (up to n) to allow address information to pass
in each direction via switches or gates 1490 as follows, while the established routes
through the concentrators are held in place. The first-stage connection request
addresses are routed to the first- and third-stage status lists, where they are stored in
the appropriate middle-stage switch locations. The log₂ N address bits can be
transmitted sequentially, i.e. pipelined. The state of one additional bit location can be
changed to signify the seizure of that middle-stage switch from the first-stage switch.
In the opposite direction, preferably simultaneously, the addresses of the common,
free middle-stage switches are routed through to the first-stage connection request
locations and stored there. The middle-stage switch addresses can be simply
generated by simultaneously clocking them sequentially from the respective one of
the plurality of shift registers 1450.

The first-stage and third-stage switch pairings can now be cycled to the next
set of pairings. This can be simply achieved by transmitting the first-stage matrix and
list to the next adjacent processor. Alternatively, transmission requirements would be
lower if instead only the third-stage list were transmitted to the adjacent processor.
The first processor would transmit its third-stage list to the N/n-1th processor. The
iterations continue until all first-stage and third-stage switches have been paired
together. At the end of this first path-searching step, all first- and third-stage switches
have sufficient information to determine which of their output ports to connect to each
of their input ports.
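
The cycling of pairings can be pictured as a rotation of the third-stage lists around a ring of processors. The sketch below is one possible model, assuming processors and switches are numbered from 0 and that each processor passes its third-stage list to its lower-numbered neighbour, with processor 0 wrapping around to the last one; the function name `pairing_schedule` is hypothetical.

```python
def pairing_schedule(num_switches):
    """Yield, for each iteration, the (first_stage, third_stage) pairings.

    Processor i keeps first-stage switch i. In iteration t it holds the
    status list of third-stage switch (i + t) mod num_switches, which is the
    effect of every processor handing its third-stage list to processor i - 1.
    """
    for t in range(num_switches):
        yield [(i, (i + t) % num_switches) for i in range(num_switches)]


# N/n = 4 first-stage and third-stage switches, hence 4 iterations.
schedule = list(pairing_schedule(4))
```

Under these assumptions every first-stage switch meets every third-stage switch exactly once over the N/n iterations, and no switch takes part in more than one pairing per iteration.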

Assuming that the concentrators have the structure described by Szymanski,
op. cit., each status list concentrator has O(3 log₂ m) stages and each connection
request concentrator has O(3 log₂ N) stages. For all N/n iterations, therefore, the
middle-stage switch addresses take O((N/n)(3 log₂ m + 3 log₂ N + log₂ m)) steps to be
completed, and the connection request addresses take O((N/n)(3 log₂ m + 4 log₂ N))
steps. The latter is the larger, and therefore represents the number of computing
steps needed to allocate middle-stage switches to all connections in the first- and
third-stage switches.
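
To make the comparison of the two step counts concrete, the following sketch evaluates both expressions for given N, n and m. It takes the expressions inside the O(...) bounds at face value and ignores any hidden constant factors, which is an assumption made purely for illustration; the function name is hypothetical.

```python
import math


def path_search_steps(N, n, m):
    """Evaluate the two step-count expressions for the first path-search step.

    Returns (middle_stage_address_steps, connection_request_address_steps).
    Assumes n divides N.
    """
    iterations = N // n
    log_m = math.log2(m)
    log_N = math.log2(N)
    # Both concentrators are traversed, plus log2(m) middle-stage address bits.
    switch_addr = iterations * (3 * log_m + 3 * log_N + log_m)
    # Both concentrators are traversed, plus log2(N) request address bits.
    request_addr = iterations * (3 * log_m + 4 * log_N)
    return switch_addr, request_addr


# Example: N = 1024 ports, n = 32, m = 2n - 1 = 63.
switch_steps, request_steps = path_search_steps(1024, 32, 63)
```

The two expressions differ by (N/n)(log₂ N - log₂ m), so the connection request figure is the larger whenever N exceeds m.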

In the second step of the path-search algorithm, the 1×N/n connection matrix
for each middle-stage switch is computed. This is very simple, because the first-stage
status lists already contain all the information. Using N links between the first-stage
lists and a set of middle-stage connection matrix data memories, all destination port
address bits can be transmitted sequentially in only log₂ N steps, which is much faster
than the first step. Of course this could be slowed down, if desired, by using fewer
links with more steps. The middle-stage switches now have all address information
necessary to connect their input ports to their output ports.
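
The second step can be sketched as a simple transposition of the first-stage status lists into per-middle-stage connection matrices. The data layout below is an assumption: each first-stage status list is modelled as a Python list whose j-th entry is the destination port address stored against middle-stage switch j (or `None` if that switch is unused), and the third-stage switch serving a port is taken to be the port address divided by n. Neither detail is stated in this form in the text.

```python
def middle_stage_matrices(first_stage_lists, n):
    """Build one 1 x (N/n) connection matrix per middle-stage switch.

    first_stage_lists[a][j] is the destination port stored against
    middle-stage switch j in first-stage switch a's status list, or None.
    The result satisfies: matrices[j][a] is the third-stage switch that
    middle-stage switch j must connect to its input from first-stage switch a.
    """
    num_first = len(first_stage_lists)
    m = len(first_stage_lists[0])
    matrices = [[None] * num_first for _ in range(m)]
    for a, status in enumerate(first_stage_lists):
        for j, dest_port in enumerate(status):
            if dest_port is not None:
                matrices[j][a] = dest_port // n   # assumed port-to-switch mapping
    return matrices


# N = 4 ports, n = 2, so N/n = 2 outer switches and m = 2n - 1 = 3.
lists = [[3, None, 0],      # first-stage switch 0
         [1, None, None]]   # first-stage switch 1
matrices = middle_stage_matrices(lists, n=2)
```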

It is also possible to compute up to n separate connections for any Aj/Cj pair
of first and third stage switches by using a single processor to interrogate the status of
each common, middle-stage switch in turn (m steps to go through all possible
middle-stage switches). Although the overall number of computing steps is increased to O(N)
again, the number of processors required is greatly reduced to just N/n, i.e. O(N^(1/2)).
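
A minimal sketch of this sequential alternative is given below. It assumes the same 1 = free convention for the status lists as a modelling choice and updates both lists in place when a middle-stage switch is seized; the function name and signature are hypothetical.

```python
def allocate_sequentially(first_status, third_status, num_requests):
    """Scan the m middle-stage switches in turn for one first/third-stage pair.

    Seizes each switch that is free on both sides until num_requests
    connections have been placed or all m switches have been examined.
    Returns the indices of the seized middle-stage switches.
    """
    allocated = []
    for j in range(len(first_status)):
        if len(allocated) == num_requests:
            break
        if first_status[j] and third_status[j]:
            first_status[j] = 0   # mark as seized on the first-stage side
            third_status[j] = 0   # mark as seized on the third-stage side
            allocated.append(j)
    return allocated


first = [1, 0, 1, 1, 0]
third = [1, 1, 0, 1, 1]
seized = allocate_sequentially(first, third, num_requests=2)
```

The loop visits at most m switches per pairing, which corresponds to the m interrogation steps mentioned above.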

The advantage of the present invention is that the fastest parallel algorithm found for
ensuring no overbooking in a 3-stage packet switch takes O(N) time using O(N log₂ N)
components, which is a lower component count than the existing
"probe-acknowledge-send" method of resolving output port conflicts, which requires fewer
steps (O(log₂² N)), but uses more components, O(N log₂² N).

A range of computing times have been found for solving the path-searching
problem, the fastest of which takes sub-linear time O(N^(1/2) log₂ N) using O(N log₂ N)
components by employing a new, highly parallel processing algorithm making use of
existing multi-stage concentrator designs. Although the known parallel looping
algorithm is potentially faster, requiring either O(log₂³ N) computing steps using the
same number O(N log₂ N) of components, or O(log₂² N) computing steps using O(N²)
components, the use of Szymanski's self-routeing switch structure in the former may
only be asymptotically useful for large N, and full interconnection in the latter may
require far too much hardware.

The present invention provides a method whereby third stage switches can
be identified in order to route a packet from a desired input port to a desired output
port. It should be understood that the method of the present invention is compatible
with all manner of switch fabrics and methods of avoiding fabric contention. However,
it is preferred that the fabric contention avoidance method described in the applicant's
co-pending United Kingdom patent application GB0006084.8 (the contents of which
are hereby incorporated by reference) is implemented.

It will be understood that the algorithms and structures described above are
equally applicable to either optical, electronic or opto-electronic switches. The person
skilled in the art will readily appreciate that the invention also includes different
combinations of Boolean logic and logic elements that achieve the same, or similar
results, as those described above, e.g. replacing AND gates with NAND gates and
changing the '0' and '1' signals appropriately, etc. Although the above description
has assumed that the three-stage switch is strictly non-blocking, i.e. m=2n-1, the
invention is still applicable to those three stage switches having lower values of m.
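
As a small illustration of the remark about alternative logic elements, the sketch below checks that a NAND gate read with inverted (active-low) output carries the same information as the AND gate used in the description. The function names and the active-low reading are assumptions chosen for the example.

```python
from itertools import product


def and_gate(a, b):
    """Active-high: output 1 means free on both sides."""
    return a & b


def nand_gate(a, b):
    """Active-low equivalent: output 0 means free on both sides."""
    return 1 - (a & b)


# Inverting the NAND output recovers the AND output for every input pair.
equivalent = all(and_gate(a, b) == 1 - nand_gate(a, b)
                 for a, b in product((0, 1), repeat=2))
```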

CLAIMS

1. A method of identifying available time slot(s) in a packet switch in order to
route a packet from an input port to a designated output port, the method comprising
the steps of:

(a) logically combining the status of each time slot with regard to the input
port and with regard to the output port to generate the status of the time slot with
regard to the given input port-output port pair;

(b) logically combining in pairs the status of input port-output port pairs to
determine whether one of the input port-output port pairs is available; and

(c) repeating step (b) iteratively until one available input port-output port
has been identified.

2. A method of identifying available time slot(s) in a packet switch in order to
route a packet from an input port to a designated output port, the method comprising
the steps of:

(a) logically combining the status of each time slot with regard to the input
port and with regard to the output port to generate the status of the time slot with
regard to the given input port-output port pair, and

(b) processing the inlet-outlet pair status information generated in step (a)
in a logical concentrator so that the available input port-output port pair(s) are ordered
hierarchically.

3. A method of allocating switch requests within a packet switch, the method
comprising the steps of:

(a) establishing switch request data at each input port;

(b) processing the switch request data for each input port to generate
request data for each input port-output port pairing;

(c) comparing the number of requests from each input port and to each
output port with the maximum request capacity of each input port and each output
port; and

(d) allocating all requests for those input-output pairs where the total
number of requests is less than or equal to the maximum request capacity of each
input port and each output port;

(e) reducing the number of requests for those input-output pairs where the
total number of requests is greater than the maximum request capacity of each input
port and each output port such that the number of requests is less than or equal to the
maximum request capacity of each input port and each output port; and

(f) allocating the remaining requests.

4. A method of allocating switch requests within a packet switch, the method
comprising the steps of:

(a) establishing switch request data at each input port;

(b) processing the switch request data for each input port to generate
request data for each input port-output port pairing;

(c) allocating a first switch request from each of the input port-output port
pairing request data, the requests being allocated only if the maximum request
capacity of the respective output port has not been reached; and

(d) allocating further switch requests by the iterative application of step (c)
until the maximum request capacity of each output port has been reached.

5. A method of allocating switch requests within a packet switch, the method
comprising the steps of:

(a) establishing switch request data at each input port;

(b) processing the switch request data for each input port to generate
request data for each input port-output port pairing;

(c) identifying a first switch request from each of the input port-output port
pairing request data;

(d) identifying further switch requests by the iterative application of step (c)
until all of the switch request data has been identified;

(e) subject to the maximum request capacity of each input port and each
output port, allocating all of the identified switch requests; and

(f) reserving unallocated switch requests for use in the next phase of
switch request allocation.

6. A method of packet switching in which available time slots are identified 
according to the method of either claim 1 or claim 2. 

7. A method of packet switching in which switch requests are allocated
according to the method of any of claims 3 to 5. 

8. A method of packet switching in which:

switch requests are allocated according to claim 7;

available time slots are identified according to claim 6; and

allocated switch requests are performed using identified available time slots.

9. A packet switch which switches packets according to the methods of any of
claims 6 to 8.